55 research outputs found

    Sparse Conformal Predictors

    Conformal predictors, introduced by Vovk et al. (2005), serve to build prediction intervals by exploiting a notion of conformity of the new data point with previously observed data. In the present paper, we propose a novel method for constructing prediction intervals for the response variable in multivariate linear models. The main emphasis is on sparse linear models, where only a few of the covariates have a significant influence on the response variable even if their number is very large. Our approach is based on combining the principle of conformal prediction with the $\ell_1$-penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter $\epsilon>0$ and has a coverage probability larger than or equal to $1-\epsilon$. The numerical experiments reported in the paper show that the length of the confidence set is small. Furthermore, as a by-product of the proposed approach, we provide a data-driven procedure for choosing the LASSO penalty. The selection power of the method is illustrated on simulated data.
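
As a rough illustration of how a conformal guarantee can be paired with a LASSO fit, the sketch below uses split conformal prediction, which is a simpler, swapped-in variant and not the construction proposed in the paper. The function names (`lasso_cd`, `split_conformal_interval`), the penalty value `lam` and the simulated data are all assumptions made for the example. Under exchangeability of the data, the interval returned by split conformal prediction has marginal coverage of at least $1-\epsilon$.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding: closed-form solution of the one-dimensional Lasso."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def split_conformal_interval(X, y, x_new, lam, eps):
    """Split conformal interval around a Lasso prediction (illustrative only)."""
    n = len(X)
    half = n // 2
    # Fit on the first half, calibrate on the second half.
    beta = lasso_cd(X[:half], y[:half], lam)

    def predict(x):
        return sum(b * v for b, v in zip(beta, x))

    scores = sorted(abs(y[i] - predict(X[i])) for i in range(half, n))
    m = len(scores)
    k = math.ceil((1 - eps) * (m + 1))
    q = scores[k - 1] if k <= m else math.inf
    center = predict(x_new)
    return center - q, center + q


# Hypothetical sparse model: only the first two of 20 covariates matter.
random.seed(0)
n, p = 100, 20
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0.0, 0.5) for row in X]
x_new = [random.gauss(0.0, 1.0) for _ in range(p)]
lo, hi = split_conformal_interval(X, y, x_new, lam=0.1, eps=0.1)
print(lo, hi)
```

The split variant trades some statistical efficiency (only half of the data is used for fitting) for a much lower computational cost than refitting the LASSO for every candidate response value.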

    Transductive versions of the LASSO and the Dantzig Selector

    We consider the linear regression problem, where the number $p$ of covariates is possibly larger than the number $n$ of observations $(x_{i},y_{i})_{1\leq i \leq n}$, under sparsity assumptions. On the one hand, several methods have been successfully proposed to perform this task, for example the LASSO or the Dantzig Selector. On the other hand, consider new values $(x_{i})_{n+1\leq i \leq m}$. If one wants to estimate the corresponding $y_{i}$'s, one should think of a specific estimator devoted to this task, referred to by Vapnik as a "transductive" estimator. This estimator may differ from an estimator designed for the more general task "estimate on the whole domain". In this work, we propose a generalized version of both the LASSO and the Dantzig Selector, based on geometrical remarks about the LASSO made in previous works. The "usual" LASSO and Dantzig Selector, as well as new estimators interpreted as transductive versions of the LASSO, appear as special cases. These estimators are interesting at least from a theoretical point of view: we can give theoretical guarantees for them under hypotheses that are relaxed versions of the hypotheses required in the papers about the "usual" LASSO. These estimators can also be computed efficiently, with results comparable to those of the LASSO.

    Consistency of plug-in confidence sets for classification in semi-supervised learning

    Confident prediction is highly relevant in machine learning; for example, in applications such as medical diagnosis, a wrong prediction can be fatal. For classification, there already exist procedures that allow one to abstain from classifying data when the confidence in the prediction is weak. This approach is known as classification with reject option. In the present paper, we provide new methodology for this approach. Predicting a new instance via a confidence set, we ensure an exact control of the probability of classification. Moreover, we show that this methodology is easily implementable and entails attractive theoretical and numerical properties.

    Sparse conformal predictors: SCP

    Conformal predictors, introduced by Vovk et al. (Algorithmic Learning in a Random World, Springer, New York, 2005), serve to build prediction intervals by exploiting a notion of conformity of the new data point with previously observed data. We propose a novel method for constructing prediction intervals for the response variable in multivariate linear models. The main emphasis is on sparse linear models, where only a few of the covariates have a significant influence on the response variable even if the total number of covariates is very large. Our approach is based on combining the principle of conformal prediction with the ℓ1-penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter ε>0 and has a coverage probability larger than or equal to 1−ε. The numerical experiments reported in the paper show that the length of the confidence set is small. Furthermore, as a by-product of the proposed approach, we provide a data-driven procedure for choosing the LASSO penalty. The selection power of the method is illustrated on simulated and real data.

    How Correlations Influence Lasso Prediction

    We study how correlations in the design matrix influence Lasso prediction. First, we argue that the higher the correlations are, the smaller the optimal tuning parameter is. This implies in particular that the standard tuning parameters, which do not depend on the design matrix, are not favorable. Furthermore, we argue that Lasso prediction works well for any degree of correlation if suitable tuning parameters are chosen. We study these two subjects theoretically as well as with simulations.

    On Lasso refitting strategies

    A well-known drawback of $\ell_1$-penalized estimators is the systematic shrinkage of the large coefficients towards zero. A simple remedy is to treat Lasso as a model-selection procedure and to perform a second refitting step on the selected support. In this work we formalize the notion of refitting and provide oracle bounds for arbitrary refitting procedures of the Lasso solution. One of the most widely used refitting techniques, which is based on Least-Squares, may bring a problem of interpretability, since the signs of the refitted estimator might be flipped with respect to the original estimator. This problem arises from the fact that the Least-Squares refitting considers only the support of the Lasso solution, ignoring any information about signs or amplitudes. To address this, we define a sign-consistent refitting as an arbitrary refitting procedure that preserves the signs of the first-step Lasso solution, and we provide oracle inequalities for such estimators. Finally, we consider special refitting strategies: Bregman Lasso and Boosted Lasso. Bregman Lasso has the useful property of converging to the Sign-Least-Squares refitting (Least-Squares with sign constraints), which provides greater interpretability. We additionally study the Bregman Lasso refitting in the case of orthogonal design, providing simple intuition behind the proposed method. Boosted Lasso, in contrast, uses information about the magnitudes of the first Lasso step and allows one to derive better oracle rates for prediction. Finally, we conduct an extensive numerical study to show the advantages of one approach over the others in different synthetic and semi-real scenarios.
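
The shrinkage that motivates refitting is easiest to see in an orthonormal design, where the Lasso reduces to soft-thresholding the least-squares coefficients. The sketch below is a toy illustration under that assumption; it is not the Bregman or Boosted Lasso of the paper, and the names `lasso_orthonormal` and `ls_refit_orthonormal` as well as the numbers are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_orthonormal(z, lam):
    """Lasso for the objective 0.5 * ||z - b||^2 + lam * ||b||_1.

    Here z holds the least-squares coefficients of an orthonormal design.
    """
    return [soft_threshold(v, lam) for v in z]


def ls_refit_orthonormal(z, beta_lasso):
    """Least-squares refit restricted to the support selected by the Lasso."""
    return [v if b != 0.0 else 0.0 for v, b in zip(z, beta_lasso)]


z = [3.0, -2.0, 0.4, -0.1]   # hypothetical least-squares coefficients
lam = 0.5
beta = lasso_orthonormal(z, lam)       # large coefficients are shrunk by lam
refit = ls_refit_orthonormal(z, beta)  # shrinkage removed on the support
print(beta)   # [2.5, -1.5, 0.0, 0.0]
print(refit)  # [3.0, -2.0, 0.0, 0.0]
```

In this special case the refit keeps the signs of the Lasso solution automatically; the sign flips discussed in the abstract can only occur when the selected columns are correlated.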

    The Smooth-Lasso and other $\ell_1+\ell_2$-penalized methods

    We consider a linear regression problem in a high-dimensional setting where the number of covariates $p$ can be much larger than the sample size $n$. In such a situation, one often assumes sparsity of the regression vector, i.e., the regression vector contains many zero components. We propose a Lasso-type estimator $\hat{\beta}^{Quad}$ (where 'Quad' stands for quadratic) which is based on two penalty terms. The first one is the $\ell_1$ norm of the regression coefficients, used to exploit the sparsity of the regression as done by the Lasso estimator, whereas the second is a quadratic penalty term introduced to capture some additional information on the setting of the problem. We detail two special cases: the Elastic-Net $\hat{\beta}^{EN}$, which deals with sparse problems where correlations between variables may exist; and the Smooth-Lasso $\hat{\beta}^{SL}$, which responds to sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive variables). From a theoretical point of view, we establish variable selection consistency results and show that $\hat{\beta}^{Quad}$ achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the 'true' regression vector. These results are provided under a weaker assumption on the Gram matrix than the one used by the Lasso. In some situations this guarantees a significant improvement over the Lasso. Furthermore, a simulation study is conducted and shows that the S-Lasso $\hat{\beta}^{SL}$ performs better than known methods such as the Lasso, the Elastic-Net $\hat{\beta}^{EN}$, and the Fused-Lasso with respect to the estimation accuracy. This is especially the case when the regression vector is 'smooth', i.e., when the variations between successive coefficients of the unknown parameter of the regression are small. The study also reveals that the theoretical calibration of the tuning parameters and the one based on 10-fold cross-validation yield two S-Lasso solutions with close performance.
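
One plausible way to write the two-penalty criterion described in this abstract is given below. The symbols $Y$, $X$, $\lambda$, $\mu$ and the positive semi-definite matrix $J$ are notation assumed for this sketch, and the normalization of the squared loss may differ from the one used in the paper.

```latex
\hat{\beta}^{Quad} \in \arg\min_{\beta\in\mathbb{R}^{p}}
  \; \|Y - X\beta\|_2^2 + \lambda\|\beta\|_1 + \mu\,\beta^{\top} J \beta,
\qquad
\beta^{\top} J \beta =
\begin{cases}
  \sum_{j=1}^{p} \beta_j^{2} & \text{(Elastic-Net, } J = I_p\text{)},\\[4pt]
  \sum_{j=2}^{p} (\beta_j - \beta_{j-1})^{2} & \text{(Smooth-Lasso)}.
\end{cases}
```

The Smooth-Lasso choice penalizes squared differences between successive coefficients, which matches the abstract's description of coefficients that "vary slowly".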

    Generalization of l1 constraints for high dimensional regression problems

    We focus on the high-dimensional linear regression $Y\sim\mathcal{N}(X\beta^{*},\sigma^{2}I_{n})$, where $\beta^{*}\in\mathbb{R}^{p}$ is the parameter of interest. In this setting, several estimators such as the LASSO and the Dantzig Selector are known to satisfy interesting properties whenever the vector $\beta^{*}$ is sparse. Interestingly, both the LASSO and the Dantzig Selector can be seen as orthogonal projections of 0 onto $\mathcal{DC}(s)=\{\beta\in\mathbb{R}^{p},\|X'(Y-X\beta)\|_{\infty}\leq s\}$, using an $\ell_{1}$ distance for the Dantzig Selector and an $\ell_{2}$ distance for the LASSO. For a well-chosen $s>0$, this set is actually a confidence region for $\beta^{*}$. In this paper, we investigate the properties of estimators defined as projections onto $\mathcal{DC}(s)$ using general distances. We prove that the obtained estimators satisfy oracle properties close to those of the LASSO and the Dantzig Selector. On top of that, it turns out that these estimators can be tuned to exploit a different sparsity and/or slightly different estimation objectives.
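
The constraint set $\mathcal{DC}(s)$ from this abstract is simple to evaluate numerically. The following sketch only checks whether a candidate vector belongs to the set; it does not compute any of the projection estimators, and the function names and the tiny data set are assumptions made for illustration.

```python
def dantzig_constraint_value(X, y, beta):
    """Return ||X'(y - X beta)||_inf for a design X given as a list of rows."""
    n, p = len(X), len(X[0])
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    return max(abs(sum(X[i][j] * resid[i] for i in range(n))) for j in range(p))


def in_dc(X, y, beta, s):
    """True when beta lies in DC(s) = {beta : ||X'(y - X beta)||_inf <= s}."""
    return dantzig_constraint_value(X, y, beta) <= s


# Hypothetical noiseless example: y = X @ [1, 2].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

print(dantzig_constraint_value(X, y, [0.0, 0.0]))  # 5.0
print(in_dc(X, y, [0.0, 0.0], 4.9))                # False
print(in_dc(X, y, [1.0, 2.0], 0.0))                # True
```

The origin belongs to $\mathcal{DC}(s)$ exactly when $s \geq \|X'Y\|_{\infty}$, in which case projecting 0 onto the set returns 0 whatever distance is used.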